TensorFlow Basics
R Interfaces to TensorFlow
Deep Learning
Supporting Tools
Deployment
Learning More
TensorFlow Basics
R Interfaces to TensorFlow
Deep Learning
Supporting Tools
Deployment
Learning More
Rapid development, many outside contributors
Use in ML classes is growing: 9 out of top 14 online deep learning courses use TensorFlow
Robust foundation for machine learning and deep learning applications
TensorFlow models can be deployed with a low-latency C++ runtime
R has a lot to offer as an interface language for TensorFlow
# Greta theta = normal(0, 32, dim = 2) mu <- alpha + beta * Z X = normal(mu, sigma) p <- ilogit(theta[1] + theta[2] * X) distribution(y) = binomial(n, p)
# BUGS/JAGS
for(j in 1 : J) {
y[j] ~ dbin(p[j], n[j])
logit(p[j]) <- theta[1] + theta[2] * X[j]
X[j] ~ dnorm(mu[j], tau)
mu[j] <- alpha + beta * Z[j]
}
theta[1] ~ dnorm(0.0, 0.001)
theta[2] ~ dnorm(0.0, 0.001)
| Dimension | R object |
|---|---|
| 0D | 42 |
| 1D | c(42, 42, 42) |
| 2D | matrix(42, nrow = 2, ncol = 2) |
| 3D | array(42, dim = c(2,3,2)) |
| 4D | array(42, dim = c(2,3,2,3)) |
Vector data—2D tensors of shape (samples, features)
Timeseries or sequence data—3D tensors of shape (samples, timesteps, features)
Images—4D tensors of shape (samples, height, width, channels)
Video—5D tensors of shape (samples, frames, height, width, channels)
head(data.matrix(iris), n = 10)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species [1,] 5.1 3.5 1.4 0.2 1 [2,] 4.9 3.0 1.4 0.2 1 [3,] 4.7 3.2 1.3 0.2 1 [4,] 4.6 3.1 1.5 0.2 1 [5,] 5.0 3.6 1.4 0.2 1 [6,] 5.4 3.9 1.7 0.4 1 [7,] 4.6 3.4 1.4 0.3 1 [8,] 5.0 3.4 1.5 0.2 1 [9,] 4.4 2.9 1.4 0.2 1 [10,] 4.9 3.1 1.5 0.1 1
You define the graph in R
Graph is compiled and optimized
Graph is executed (in part, or fully) on devices
Nodes represent computations
R Code
TensorFlow Graph
High-level R interfaces for neural nets and traditional models
Low-level interface to allow enable new applications (e.g. Greta)
Tools to faciliate productive workflow / experiment management
Easy access to GPUs for training models
Breadth and depth of educational resources
High-level neural networks API capable of running on top of TensorFlow, CNTK, or Theano (and soon MXNet).
Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
Supports both convolutional networks and recurrent networks, as well as combinations of the two.
Runs seamlessly on CPU and GPU.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
input_shape = input_shape) %>%
layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 10, activation = 'softmax')
library(keras) # Load MNIST images datasets (built in to Keras) c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist() # Flatten images and transform RGB values into [0,1] range x_train <- array_reshape(x_train, c(nrow(x_train), 784)) x_test <- array_reshape(x_test, c(nrow(x_test), 784)) x_train <- x_train / 255 x_test <- x_test / 255 # Convert class vectors to binary class matrices y_train <- to_categorical(y_train, 10) y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential() %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')
model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
summary(model)
_____________________________________________________________________________________ Layer (type) Output Shape Param # ===================================================================================== dense_1 (Dense) (None, 256) 200960 _____________________________________________________________________________________ dropout_1 (Dropout) (None, 256) 0 _____________________________________________________________________________________ dense_2 (Dense) (None, 128) 32896 _____________________________________________________________________________________ dropout_2 (Dropout) (None, 128) 0 _____________________________________________________________________________________ dense_3 (Dense) (None, 10) 1290 ===================================================================================== Total params: 235,146 Trainable params: 235,146 Non-trainable params: 0 _____________________________________________________________________________________
history <- model %>% fit( x_train, y_train, batch_size = 128, epochs = 30, validation_split = 0.2 )
history
Trained on 48,000 samples, validated on 12,000 samples (batch_size=128, epochs=30)
Final epoch (plot to see history):
acc: 0.9057
loss: 1.5
val_acc: 0.9317
val_loss: 1.088
plot(history)
model %>% evaluate(x_test, y_test)
$loss [1] 0.1078904 $acc [1] 0.9815
model %>% predict_classes(x_test[1:100,])
[1] 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 [36] 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 [71] 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9
| Estimator | Description |
|---|---|
| linear_regressor() | Linear regressor model. |
| linear_classifier() | Linear classifier model. |
| dnn_regressor() | Dynamic nueral network regression. |
| dnn_classifier() | Dynamic nueral network classification. |
| dnn_linear_combined_regressor() | DNN Linear Combined Regression. |
| dnn_linear_combined_classifier() | DNN Linear Combined Classification. |
W <- tf$Variable(tf$random_uniform(shape(1L), -1.0, 1.0))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- W * x_data + b
loss <- tf$reduce_mean((y - y_data) ^ 2)
optimizer <- tf$train$GradientDescentOptimizer(0.5)
train <- optimizer$minimize(loss)
sess = tf$Session()
sess$run(tf$global_variables_initializer())
for (step in 1:200) {
sess$run(train)
if (step %% 20 == 0)
cat(step, "-", sess$run(W), sess$run(b), "\n")
}
Deep learning was a principle motivator for the creation of TensorFlow
What is deep learning and what is it useful for?
How is deep learning relevant to the R community?
Special thanks to François Chollet (creator of Keras) for the concepts and figures used to explain deep learning! (all drawn from Chapter 1 of his Deep Learning with R book)
Input data points (e.g. images)
Labels (e.g. cat vs. dog)
Model architecture (e.g. logistic regression)
Loss function (measures performance of model, e.g. categorical crossentroopy)
Loss is used as a feedback signal to an optimizer (e.g. stochastic gradient descent) to iteratively adjust the model parameters
Training consists of feeding batches of data to training process to learn the optimal model parameters.
Machine-learning algorithms aren’t usually creative in finding these transformations; they’re merely searching through a predefined set of operations, called a hypothesis space.
A new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.
Other possibly more appropriate names for the field:
Layered representations learning
Hierarchical representations learning
Modern deep learning often involves tens or even hundreds of successive layers of representation
Other approaches to machine learning tend to focus on learning only one or two layers of representations of the data
Layers perform transformations using weights, but how do we learn the weights?
The loss function takes the predictions of the network and the true targets (what you wanted the network to output) and computes a distance score, capturing how well the network has done on a batch of examples.
The fundamental trick in deep learning is to use this score as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current batch of examples. This adjustment is the job of the optimizer.
Updates are done via the backpropagation algorithm, using the using the chain rule (from calculus) to iteratively compute gradients for each layer.
The training loop, which, repeated a sufficient number of times (typically tens of iterations over thousands of examples), yields weight values that minimize the loss function.
Hardware (NVIDIA GPUs)
Data (internet enabled collection and distribution of very large datasets)
Various algorithmic advances enabled training models with 10 or more layers
Research now proceeding at a furious pace!
Visual recognition
Language translation
Time series
Biology
What next?
ImageNet: 3.2 million labelled images, separated into 5,247 categories, sorted into 12 subtrees like “mammal,” “vehicle,” and “furniture.”
ImageNet Challenge: an annual competition (2010-2017) to see which algorithms could identify objects in the dataset’s images with the lowest error rate.
Accuracy improved from 71.8% to 97.3% over the lifetime of the contest.
Deep learning was used for the first time in 2012, and beat the field by 10.8%.
Google's Neural Machine Translation System
# use a pretrained model as the base of a new classifier conv_base <- application_vgg16( weights = "imagenet", include_top = FALSE, input_shape = c(150, 150, 3) ) model <- keras_model_sequential() %>% conv_base %>% layer_flatten() %>% layer_dense(units = 256, activation = "relu") %>% layer_dense(units = 1, activation = "sigmoid")
# stacked recurrent layers
model <- keras_model_sequential() %>%
layer_gru(units = 32, dropout = 0.1, recurrent_dropout = 0.5,
return_sequences = TRUE, input_shape = list(NULL, dim(data)[[-1]])) %>%
layer_gru(units = 64, dropout = 0.1, recurrent_dropout = 0.5,
activation = "relu") %>%
layer_dense(units = 1)
# bidirectional recurrent layers
model <- keras_model_sequential() %>%
bidirectional(
layer_gru(units = 32), input_shape = list(NULL, dim(data)[[-1]])
) %>%
layer_dense(units = 1)
An autoencoder is a neural network that is used to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.
For this problem we train an autoencoder to encode non-fraud observations from our training set.
Since frauds are supposed to have a different distribution then normal transactions, we expect that our autoencoder will have higher reconstruction errors on frauds then on normal transactions.
Successful deep learning requires a huge amount of experimentation
This requires a systematic approach to conducting and tracking the results of experiements
The training_run() function is like the source() function, but it automatically tracks and records output and metadata for the execution of the script:
library(tfruns)
training_run("mnist_mlp.R")
ls_runs()
Data frame: 4 x 28
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T21-01-11Z 0.1485 0.9562 0.2577 0.9240 0.1482 0.9545
2 runs/2017-12-09T21-00-11Z 0.1438 0.9573 0.2655 0.9208 0.1505 0.9559
3 runs/2017-12-09T19-59-44Z 0.1407 0.9580 0.2597 0.9241 0.1402 0.9578
4 runs/2017-12-09T19-56-48Z 0.1437 0.9555 0.2610 0.9227 0.1459 0.9551
ls_runs(eval_acc > 0.9570, order = eval_acc)
Data frame: 2 x 28
run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T19-59-44Z 0.9580 0.1407 0.2597 0.9241 0.1402 0.9578
2 runs/2017-12-09T21-00-11Z 0.9573 0.1438 0.2655 0.9208 0.1505 0.9559
# define flags and their defaults
FLAGS <- flags(
flag_integer("dense_units1", 128),
flag_numeric("dropout1", 0.4),
flag_integer("dense_units2", 128),
flag_numeric("dropout2", 0.3)
)
# use flag layer_dropout(rate = FLAGS$dropout1)
# train with flag
training_run("mnist_mlp.R", flags = list(dropout1 = 0.3))
# run various combinations of dropout1 and dropout2
runs <- tuning_run("mnist_mlp.R", flags = list(
dropout1 = c(0.2, 0.3, 0.4),
dropout2 = c(0.2, 0.3, 0.4)
))
# find the best evaluation accuracy runs[order(runs$eval_acc, decreasing = TRUE), ]
Data frame: 9 x 28
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z 0.1002 0.9817 0.0346 0.9900 0.1086 0.9794
6 runs/2018-01-26T13-23-26Z 0.1133 0.9799 0.0409 0.9880 0.1236 0.9778
5 runs/2018-01-26T13-24-11Z 0.1056 0.9796 0.0613 0.9826 0.1119 0.9777
4 runs/2018-01-26T13-24-57Z 0.1098 0.9788 0.0868 0.9770 0.1071 0.9771
2 runs/2018-01-26T13-26-28Z 0.1185 0.9783 0.0688 0.9819 0.1150 0.9783
3 runs/2018-01-26T13-25-43Z 0.1238 0.9782 0.0431 0.9883 0.1246 0.9779
8 runs/2018-01-26T13-21-53Z 0.1064 0.9781 0.0539 0.9843 0.1086 0.9795
7 runs/2018-01-26T13-22-40Z 0.1043 0.9778 0.0796 0.9772 0.1094 0.9777
1 runs/2018-01-26T13-27-14Z 0.1330 0.9769 0.0957 0.9744 0.1304 0.9751
Scalable training of models built with the keras, tfestimators, and tensorflow R packages.
On-demand access to training on GPUs, including the new Tesla P100 GPUs from NVIDIA®.
Hyperparameter tuning to optmize key attributes of model architectures in order to maximize predictive accuracy.
Train on default CPU instance:
library(cloudml)
cloudml_train("mnist_mlp.R")
Automatically uploads contents of working directory along with script
Automatically installs all required R packages on CloudML servers
# Train on a GPU instance
cloudml_train("mnist_mlp.R", master_type = "standard_gpu")
# Train on an NVIDIA Tesla P100 GPU
cloudml_train("mnist_mlp.R", master_type = "standard_p100")
job_collect()
Collects job metadata and all files created by the job (e.g. event logs, saved models)
Uses tfruns to allow inspection, enumeration, and comparision of jobs
ls_runs()
Data frame: 6 x 37
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
6 runs/cloudml_2018_01_26_135812740 0.1049 0.9789 0.0852 0.9760 0.1093 0.9770
2 runs/cloudml_2018_01_26_140015601 0.1402 0.9664 0.1708 0.9517 0.1379 0.9687
5 runs/cloudml_2018_01_26_135848817 0.1159 0.9793 0.0378 0.9887 0.1130 0.9792
3 runs/cloudml_2018_01_26_135936130 0.0963 0.9780 0.0701 0.9792 0.0969 0.9790
1 runs/cloudml_2018_01_26_140045584 0.1486 0.9682 0.1860 0.9504 0.1453 0.9693
4 runs/cloudml_2018_01_26_135912819 0.1141 0.9759 0.1272 0.9655 0.1087 0.9762
# ... with 30 more columns:
# flag_dense_units1, flag_dropout1, flag_dense_units2, flag_dropout2, samples, validation_samples,
# batch_size, epochs, epochs_completed, metrics, model, loss_function, optimizer, learning_rate,
# script, start, end, completed, output, source_code, context, type, cloudml_console_url,
# cloudml_created, cloudml_end, cloudml_job, cloudml_log_url, cloudml_ml_units, cloudml_start,
# cloudml_state
FLAGS <- flags(
flag_integer("dense_units1", 128),
flag_numeric("dropout1", 0.4),
flag_integer("dense_units2", 128),
flag_numeric("dropout2", 0.3)
)
model <- keras_model_sequential() %>%
layer_dense(units = FLAGS$dense_units1, activation = 'relu',
input_shape = c(784)) %>%
layer_dropout(rate = FLAGS$dropout1) %>%
layer_dense(units = FLAGS$dense_units2, activation = 'relu') %>%
layer_dropout(rate = FLAGS$dropout2) %>%
layer_dense(units = 10, activation = 'softmax')trainingInput:
hyperparameters:
goal: MAXIMIZE
hyperparameterMetricTag: acc
maxTrials: 10
params:
- parameterName: dropout1
type: DOUBLE
minValue: 0.2
maxValue: 0.6
scaleType: UNIT_LINEAR_SCALE
- parameterName: dropout2
type: DOUBLE
minValue: 0.1
maxValue: 0.5
scaleType: UNIT_LINEAR_SCALEcloudml_train("minst_mlp.R", config = "tuning.yml")
job_trials()
finalMetric.objectiveValue finalMetric.trainingStep hyperparameters.dropout1 hyperparameters.dropout2 trialId 1 0.973854 19 0.2011326172916916 0.32774705750441724 10 2 0.973458 19 0.20090378506439671 0.10079321757280404 3 3 0.973354 19 0.5476299090261757 0.49998941144858033 6 4 0.972875 19 0.597820322273044 0.4074512354566201 7 5 0.972729 19 0.25969787952729828 0.42851076497180118 1 6 0.972417 19 0.20045494784980847 0.15927383711937335 4 7 0.972188 19 0.33367593781223304 0.10077055587860367 5 8 0.972188 19 0.59880072314674071 0.10476853415572558 9 9 0.972021 19 0.40078175292512 0.49982245025905447 8 10
job_collect(trials = "best")
Collects trial with best objective metric by default
Specify trials = "all" to collect all trials (runs) and then perform offline anaysis of hyperparameter interactions via ls_runs().
Attend Michael Quinn's Large scale machine learning using TensorFlow, BigQuery and CloudML Engine talk after the break (interop track) to learn more.
Content details will be announced at the talk!
TensorFlow was built from the ground up to enable deployment using a low-latency C++ runtime.
Deploying TensorFlow models requires no runtime R or Python code.
Key enabler for this is the TensorFlow SavedModel:
SavedModel provides a language-neutral format to save machine-learned models that is recoverable and hermetic. It enables higher-level systems and tools to produce, consume and transform TensorFlow models.
TensorFlow models can be deployed to servers, embedded devices, mobile phones, and even to a web browser!
model <- keras_model_sequential( %>% )
layer_dense(units = 256, activation = 'relu', input_shape = c(784),
name = "image") %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax',
name = "prediction")
Note that we give the input and output layers names ("image" and "prediction")
# ...compile and fit model # export model library(tfdeploy) export_savedmodel(model, "savedmodel")
serve_savedmodel("savedmodel")
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.
TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs.
TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.
library(rsconnect)
deployTFModel("savedmodel", account = <username>, server = <internal_connect_server>)
library(cloudml)
cloudml_deploy("savedmodel", name = "keras_mnist")
Copying file://savedmodel/variables/variables.data-00000-of-00001 ... Copying file://savedmodel/saved_model.pb ... Copying file://savedmodel/variables/variables.index ... / [3/3 files][ 1.9 MiB/ 1.9 MiB] 100% Done Operation completed over 3 objects/1.9 MiB. Model created and available in https://console.cloud.google.com/mlengine/models/keras_mnist
Run on embedded devices (e.g. Raspberry Pi)
Keras models can be converted to iOS CoreML
Keras models can be deployed to the browser with Keras.js
Attend Javier Luraschi's Deploying TensorFlow models with tfdeploy talk after the break (interop track) to learn more!
Straightforward to load already trained models into Shiny applications.
Training code (and hardware!) not required for inference.
model <- load_model_hdf5("model.hdf5") # Keras
model %>% predict(input)
predict_savedmodel(input, "savedmodel") # SavedModel
One catch: data fed to deployed models must be preprocessed (e.g. normalized) in the same fashion as it was preprocessed for training.
Ideally all preprocessing would happen within the deployed model (i.e. within the TensorFlow graph/runtime) so clients could pass raw data (e.g. text, images, time-series data, etc.)
The tfestimators package has a solution for this (Feature Columns) however more work is still required for Keras.
Clients can typically find a way to do the preprocessing but it's often awkward and inconvenient. We'd like to make it much more straightforward and automatic.
Recommended reading
Keras for R cheatsheet
Gallery and examples
Subscribe to the TensorFlow for R blog!
Slides: https://beta.rstudioconnect.com/ml-with-tensorflow-and-r/
TensorFlow for R: https://tensorflow.rstudio.com
Stay up to date at: https://tensorflow.rstudio.com/blog/
# Modify model object in place (note that it is not assigned back to)
model %>% compile(
optimizer = 'rmsprop',
loss = 'binary_crossentropy',
metrics = c('accuracy')
)
Keras models are directed acyclic graphs of layers whose state is updated during training.
Keras layers can be shared by mutliple parts of a Keras model.